-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve uri.parseQuery to never raise an error #16647
Conversation
probably correct but please give some links (eg spec, stackoverflow etc) in your PR description to double check; tests need fixing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change. But also whether to raise the error should be customisable.
Please introduce a strict_parsing
flag like in Python's urllib: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.parse_qs (it should be true
by default to keep backwards compatibility)
which code would break? |
@timotheecour As an example, there may be code out there which relies on being able to check for validity of the query string and explicitly checks for a Are there any downsides to keeping backwards compatibility here? |
if the spec (or most libraries) allows |
I found some references for parsing query strings (and checked them):
|
Updated code with @timotheecour requested changes. Add compilation flag to restore old behaviour suggested. I too prefer the new behaviour by default, it is unlikely to cause breaks in existing apps and the new behaviour is better I believe. it is also conforming to standards and to the general principle of network programming : be liberal in what you accept. On the contrary, web apps written currently in Nim are probably not handling this specific case of query string parsing, and the new behavior can avoid an error that could in some cases provoke a denial of service. |
0382591
to
28b1e9c
Compare
Rebase error? Did you rebase against master instead of devel or something? |
28b1e9c
to
a296b4f
Compare
Yep, sorry, updated |
else: | ||
if c == sep: break | ||
else: add(field, data[result]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else: | |
if c == sep: break | |
else: add(field, data[result]) | |
elif c == sep: break | |
else: add(field, data[result]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the else of the case statement, I'm not convinced that elif is a valid option for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's valid, and it works, i verified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after addressing comments + fixing test failures in tests/stdlib/turi.nim
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Looking into this a bit more, the specific example we are getting to work here isn't to do with strict parsing, this is in fact fixing a bug in the current parser. Python will parse it fine even with strict parsing enabled:
>>> import urllib.parse
>>> urllib.parse.parse_qs('foo=1&bar=2=3')
{'foo': ['1'], 'bar': ['2=3']}
>>> urllib.parse.parse_qs('foo=1&bar=2=3', strict_parsing=True)
{'foo': ['1'], 'bar': ['2=3']}
But then why are you changing it to "never raise an error"? Just fix that bug and don't get rid of all errors please. That way this isn't a breaking change.
which other errors? can you show an input where you expect an error? |
@dom96: I don't understand what you are suggesting... Should we (A) fix the parser to accept the If we want to be compliant with HTML5 spec, we need to change the behaviour and this is breaking to library users that expects an exception in this special case. If we want to keep raising an exception, we can't be compliant to the HTML spec. Note: the |
a296b4f
to
be3066d
Compare
Updated the code. I'm not convinced that the CI errors comes from my code, errors seems to come from fidget and PackageFileParsed (whatever those are) and I don't really see the problem looking at the CI logs... |
Looking at the Python implementation it appears this is a case where an exception will be raised with strict parsing on: >>> urllib.parse.parse_qs('&q', strict_parsing=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.6/urllib/parse.py", line 676, in parse_qs
max_num_fields=max_num_fields)
File "/usr/lib/python3.6/urllib/parse.py", line 729, in parse_qsl
raise ValueError("bad query field: %r" % (name_value,))
ValueError: bad query field: ''
>>> urllib.parse.parse_qs('&q', strict_parsing=False)
{} And you get an exception with decodeQuery for this too. Another thing to keep in mind here is that cgi makes use of this code, and explicitly checks for exceptions, see the diff that moved this iterator: 6895040. Are we sure this isn't breaking CGI? @xflywind any thoughts? |
be3066d
to
a7de066
Compare
@dom96 You're right, As for strict checking, I believe this is not the same thing. Is there a use case for struct checking? Nor all languages advertise such a feature. In any case I believe this would belong to another PR and should be clearly defined before any implementation is attempted. |
55a5deb
to
4cabada
Compare
4cabada
to
f3d59a1
Compare
Okay, so yes you can make these changes. This code is actually brand new and hasn't been released yet in a Nim version (it was only added 16 days ago), so you don't need to add it to the changelog, nor do you need those The only concern I have is with any breakage in cgi, I can see that a test was added to check that the exception is raised correctly in 6895040. As for |
Well, actually the code in uri right now comes from cgi, it was just moved over. Perhaps the compatibility option should be renamed to refer to cgi instead. I'm not entirely sure it is needed though. Also, I'm not sure that there is a need for a strict parsing as this does not seems to be a well defined behavior. There is no common implementation of such a strict mode across all languages and I'm not sure copying the Python behavior is of any use on its own. It looks like unneeded complexity for me. |
@@ -96,6 +96,13 @@ | |||
with other backends. see #9125. Use `-d:nimLegacyJsRound` for previous behavior. | |||
- Added `socketstream` module that wraps sockets in the stream interface | |||
|
|||
- Changed the behavior of `uri.decodeQuery` when there are unencoded `=` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced that the CI errors comes from my code, errors seems to come from fidget and PackageFileParsed (whatever those are) and I don't really see the problem looking at the CI logs...
https://dev.azure.com/nim-lang/255dfe86-e590-40bb-a8a2-3c0295ebdeb1/_apis/build/builds/12009/logs/84
2021-01-11T23:28:40.1530530Z �[1m�[31mFAIL: �[36mtests/stdlib/turi.nim c�[0m
...
2021-01-11T23:28:40.1541230Z ../../lib/system/fatal.nim(53, 5) Error: unhandled exception: expected raising 'UriParseError', instead nothing was raised by:
2021-01-11T23:28:40.1542060Z discard toSeq(decodeQuery("a=1&b=2c=6")) [AssertionDefect]
it comes from this PR, the fix is trivial:
in tests/stdlib/turi.nim, adapt:
doAssertRaises(UriParseError):
discard toSeq(decodeQuery("a=1&b=2c=6"))
accordingly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I was looking at the other CI runs where this did not appear, my bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no problem; please click "resolve conversation" for any resolved comment (unless it contains pushback)
which of the languages you mentioned in #16647 (comment) define some notion of strict parsing? I'm fine without having such a "strict parsing" option, but if lots of languages have such a notion, then maybe we can reconsider (in future work) |
Of those languages, only Python seems to have strict parsing. Ruby does not neither does Go. Same for the extra options which are only available in Python. |
In case of malformed query string where there is `=` on the value, handle this character as part of the value instead of throwing an error. The following query string should no longer crash a program: key=value&key2=x=1 It will be interpreted as [("key", "value"), ("key2", "x=1")] This is correct according to latest WhatWG's HTML5 specification recarding the urlencoded parser: https://url.spec.whatwg.org/#concept-urlencoded-parser Older behavior can be restored using the -d:nimLegacyParseQueryStrict flag.
f3d59a1
to
f9402d1
Compare
ok, that's convincing. No need to support a notion of strict parsing then. |
Looks really good this way. Follup PRs can address potential shortcomings that I've missed. |
In case of malformed query string where there is `=` on the value, handle this character as part of the value instead of throwing an error. The following query string should no longer crash a program: key=value&key2=x=1 It will be interpreted as [("key", "value"), ("key2", "x=1")] This is correct according to latest WhatWG's HTML5 specification recarding the urlencoded parser: https://url.spec.whatwg.org/#concept-urlencoded-parser Older behavior can be restored using the -d:nimLegacyParseQueryStrict flag.
## Reads and decodes CGI data and yields the (name, value) pairs the | ||
## data consists of. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decodeData
should document nimLegacyParseQueryStrict
option too.
And even with
-d:nimLegacyParseQueryStrict
, it never raisesCgiError
now. It actually raisesUriParseError
.
ref #19308 (comment)
In case of malformed query string where there is
=
on the value, handlethis character as part of the value instead of throwing an error.
The following query string should no longer crash a program:
It will be interpreted as: